Self-supervised learning: see video-to-image in this blog.
Predict optical flow from a static image and use a two-stream network [1]
Predict pose information (using a poselet detector) [2]
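To make the two-stream idea concrete, here is a minimal sketch of late fusion, assuming each stream (appearance and motion) already outputs per-class scores. The function names, the equal-weight averaging, and the toy scores are assumptions for illustration, not details taken from [1].

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_two_streams(appearance_logits, motion_logits, motion_weight=0.5):
    """Late fusion: weighted average of the two streams' class probabilities.

    motion_weight is a hypothetical knob; 0.5 means both streams count equally.
    """
    p_app = softmax(appearance_logits)
    p_mot = softmax(motion_logits)
    return [(1.0 - motion_weight) * a + motion_weight * m
            for a, m in zip(p_app, p_mot)]

# Toy scores for 3 hypothetical action classes (made-up numbers).
appearance_logits = [2.0, 1.0, 0.1]   # from the RGB image
motion_logits = [0.5, 2.5, 0.3]       # from the predicted optical flow

fused = fuse_two_streams(appearance_logits, motion_logits)
predicted_class = max(range(len(fused)), key=lambda i: fused[i])
```

In this toy case the appearance stream alone would pick the first class, but the motion stream shifts the fused prediction, which is the point of adding flow predicted from a still image.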
References:
[1] Gao, Ruohan, Bo Xiong, and Kristen Grauman. “Im2Flow: Motion hallucination from static images for action recognition.” CVPR, 2018.
[2] Chen, Chao-Yeh, and Kristen Grauman. “Watching unlabeled video helps learn new human actions from very few labeled snapshots.” CVPR, 2013.